Break Signal

Purpose

Functions break_signal() and signal_processor() are designed to select the best break of a “signal”: the observed differences in the values of an ordered sequence. A “break” is simply a change in sign of the differences; the “best” break is the one indicating that the process generating the observations has changed state. Consider the following examples to help motivate the line of inquiry:
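As a minimal sketch (not the package implementation), a “break” can be located by differencing an ordered vector and finding where the sign of the differences flips; the vector y below is hypothetical:

```r
y  <- c(1, 3, 6, 5, 2, 4, 9)  # ordered observations
dy <- diff(y)                 # successive differences: 2 3 -1 -3 2 5
# indices in dy where the sign of the difference flips:
breaks <- which(diff(sign(dy)) != 0) + 1
breaks
```

Here the sign flips at the 3rd and 5th differences; selecting the “best” among such candidates is the subject of the algorithm below.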

Example 1: Cold & flu season

Cold and flu season does not have specific beginning and ending dates; however, there are generalized times of the year and metrics that serve as heuristics. Autumn and spring generally mark the beginning and end of the season, respectively, with the time in between typically accompanied by increased case counts and geo-spatial concentration, which can be successively differenced by some fixed unit such as days or square miles.

Example 2: Rush hour traffic

“Rush hour” does not have specific beginning and end times, but there are generally accepted times of day and characteristics that indicate when “rush hour” is in effect: increased traffic per unit distance coupled with decreased speed per unit time. However, these parameters (traffic and speed) change all the time, so there must be some other heuristic that differentiates “rush hour” from the normal ebb and flow of traffic.

Example 3: Emergency Department (ED) utilization

An individual may demonstrate ED utilization characterized by periods of waxing and waning usage over the course of a year. A review of expenditures for this individual would provide dates of admission and discharge, but rolling those dates up into an “episode” could prove to be a challenge. Defining days between readmissions as the heuristic for identifying a new episode is difficult if the goal is to discover the most likely root cause, especially when generalizing across a population in which each individual has their own distribution of ebbs and flows.

Note that in each of the examples, terms such as “normal”, “general”, and “typical” are used and provide some intuition of how to indicate a break in the signal.

Hypothesis

The hypothesis is that given a set of ordered values (\(Y\)), possibly being partitioned by a set of other attributes, the sign changes in successive differences in values along \(Y\) (\(\Delta{Y}_i\)) can be used to infer changes of state in the generative process by probabilistic analysis of changes in self-information (\(I\)) and associated values of \(\Delta{Y}_i\) over those changes of state. Self-information is desirable as opposed to entropy because the hypothesis is that “surprising” values of (\(\Delta{Y}_i\)) maximize the likelihood that a change of state has been detected.
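The role of self-information can be sketched in a few lines of R (the difference vector dy is hypothetical): rare, “surprising” difference values carry more bits than common ones.

```r
dy   <- c(1, 1, 1, 2, 1, 7)          # successive differences
p    <- prop.table(table(dy))        # empirical P over unique difference values
info <- -log2(p[as.character(dy)])   # I = -log2(P), in bits
info
```

The lone difference of 7 yields more information than the frequently occurring difference of 1, which is exactly the kind of “surprise” the hypothesis targets.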

[Note: “self-information” will be referred to as simply “information” for convenience.]

Assumptions

Algorithm

(Refer to the section Building Blocks for equation and symbolic definitions.)

Part I

For each \(B_i\) (Eq. 3a) and \(\Phi_i\) (Eq. 2):

Step 1
Group Information

Calculate the geometric mean of information from the proportionality of \(B^\dagger_i\) (\(\omega_q\) below) conditioned by group (subscript \(i\)) and within-group break cycle \(B_i\) (Eq. 3a):

\[ I_i := \Big[\prod_{q = 1}^z - \log_2 (\omega_q \enspace |\enspace B = B_i)\Big]^{\frac{1}{z}} , \enspace 0 < \omega_q \le 1 \longrightarrow \big<I_1 \enspace I_2\enspace\dots\enspace I_z\big>_i \\ \enspace \\ \omega_i := \big|\big|\chi_z\big|\big|^{-1} \bigg( {\sum_{q = 1}^z \chi_q} \bigg) \longrightarrow \big<\omega_1 \enspace \omega_2 \enspace \dots \enspace \omega_z \big>_i \\ \enspace \\ \chi_{i} := \beta^\dagger_q \enspace | \enspace (\Phi_q \le \delta_q,\enspace B_i) \Big|_{q = 1}^z \longrightarrow \big<\chi_1\enspace\chi_2\enspace\dots\enspace\chi_z\big>_i \]
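A sketch of Step 1 for a single group and cycle, using a hypothetical duration vector chi: the proportions omega are formed, and the cycle's information is the geometric mean of their negative log-probabilities.

```r
chi   <- c(4, 2, 6)                 # hypothetical cycle durations (B-dagger)
omega <- chi / sum(chi)             # proportionality, each in (0, 1]
I_i   <- prod(-log2(omega))^(1 / length(omega))  # geometric mean of information
I_i
```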

Step 2
Information PMF

Separately calculate the geometric (negative binomial with \(r = 1\)) probability mass function (\(g_i\)) over \(P(K_N)\) Eq. 4 by within-group cycle (\(B_i\)):

\[ g_i := p_i (1 - p_i)^{k_i} \longrightarrow \big<g_1\enspace g_2\enspace \dots\enspace g_z\big>_i\\ \enspace \\ p_i:= n^{-1} \sum_{q = 1}^{n = ||\Phi_i||} \Big\{P(K_N) \enspace |\enspace K_N\cap \Phi_i \Big\}_{q\le n} \longrightarrow \big<p_1\enspace p_2\enspace \dots\enspace p_n\big>_i \\ \enspace \\ k_i:= \frac {\sum_{q = 1}^j \phi_{j \le q} } {\sum_{q = 1}^j \phi_q}, \enspace \phi_j \in \Phi_i \longrightarrow \big<k_1\enspace k_2\enspace \dots\enspace k_z\big>_i, \enspace k_i \in B^\dagger_i \]

In the preceding equation, for each within-group cycle Eq. 3a:

  • \(p_i\) is a cumulative proportionality of \(P(K_N)\) conditioned by each \(\Phi_i\)

  • \(k_i\) is a cumulative summation of \(\Phi_i\). Since \(g_i\) returns the probability of \(k\) attempts before the first success, the cumulative summation of each \(\Phi_i\) serves as a proxy for an “attempt” (which is why the original data must be ordered before calculating \(\Phi\)). In this regard, \(k_i\) operates under a similar context as survival time.

The consequence is that \(g_i\) is a vector of draws from the geometric PMF conditioned by each \(\Phi_i\) and (implicitly) \(B_i\). Over all groups and their respective cycles, the result is a likelihood space mapped to cycle duration and the sequential breaks within each cycle.
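The geometric-PMF draws map directly onto base R's dgeom(); in this sketch p_i and k_i are hypothetical stand-ins for the quantities defined above.

```r
p_i <- 0.3                     # mean conditional probability P(K_N | Phi_i)
k_i <- c(0, 1, 3, 6)           # cumulative "attempts" within the cycle
g_i <- dgeom(k_i, prob = p_i)  # equals p * (1 - p)^k
g_i
```

dgeom(k, prob = p) parameterizes the geometric distribution as the probability of k failures before the first success, matching \(p_i (1 - p_i)^{k_i}\) above.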

Step 3
Information Measures & Score

Calculate the squared deviations of within-group cycle information \(I_i\) Step 1 from global information \(I_N\) Eq. 5 conditioned on \(\Phi_i\) and the associated scoring vector \(K^\dagger_i\) (defined below):

\[ \Theta_i := \big[(I_N \enspace \big|\enspace K_N \cap \Phi_i \ne \emptyset) - I_i \big]^2 \longrightarrow \big< \theta_1 \enspace \theta_2 \enspace \dots \enspace \theta_n \big>_i \]

\[ K^\dagger_i:= g_i \Theta_i \longrightarrow \big< \kappa_1 \enspace \kappa_2 \enspace \dots \enspace \kappa_n \big>_i \]

Next, calculate the effective slope and curvature of \(\Theta_i\) for each within-group cycle:

\[ \Theta_i^{'} := \frac{d}{dn}\theta_n\enspace\big|\enspace B_i \]

\[ \Theta_i^{''}:= \frac{d^2}{dn^2}\theta_n\enspace\big|\enspace B_i \]
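On a discrete vector, the effective slope and curvature reduce to first and second differences; theta below is a hypothetical within-cycle vector of squared deviations.

```r
theta     <- c(0.2, 0.9, 1.4, 1.5, 1.1)   # hypothetical squared deviations
slope     <- diff(theta)                   # discrete stand-in for d/dn
curvature <- diff(theta, differences = 2)  # discrete stand-in for d^2/dn^2
```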

Part II

Data & Hyper-parameters

Data

At this point, we have the following by group:

\[ \mathbb{D}_{R,C}: D_i \ni \big\{ k ;\enspace B ;\enspace B^\dagger ;\enspace K^\dagger ;\enspace I_N|\enspace \exists K_N\cap\phi ;\enspace I ;\enspace g ;\enspace \Theta ;\enspace \Theta^{'} ;\enspace \Theta^{''} \big\}_i \]

\(R\) and \(C\) are the row and column indices, respectively, for tabular data \(\mathbb{D}\). Moving forward, the groups will be ignored as they are no longer relevant.

Hyper-parameters

Three hyper-parameters are required to produce the final result: \(k_{max}\) (Step 4), \(\zeta\) (Step 5), and \(\text{n\_folds}\) (Step 6).

Step 4
Scoring, Mean Info, and Derivatives

\(\mathbb{D}\) is filtered by row where the following is true:

\[ k_i \le k_{max} \enspace\wedge\enspace \Theta^{'} \ge 0\enspace\wedge\enspace \Theta^{''} \le 0 \]
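A minimal sketch of this row filter, assuming \(\mathbb{D}\) is held as a data.frame with hypothetical column names for \(k\), \(\Theta^{'}\), and \(\Theta^{''}\):

```r
k_max <- 30  # hypothetical hyper-parameter
D <- data.frame(
  k        = c(5, 12, 40, 8),
  theta_d1 = c(0.4, -0.1, 0.2, 0.0),  # slope
  theta_d2 = c(-0.2, -0.3, 0.1, 0.0)  # curvature
)
# keep rows with k <= k_max, non-negative slope, non-positive curvature:
D_kept <- subset(D, k <= k_max & theta_d1 >= 0 & theta_d2 <= 0)
```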

The result is augmented by deriving an information-based score (\(\Omega\)) as well as the weighted-mean information (\(\bar{I}\)) of \(\Theta_i\) conditioned on \(K_N\).

Given \(q \in \big\{1, 2, \dots, N\big\}\):

\[ \Omega^k_R := \Big(K^\dagger \enspace | \enspace \big\{g_i = \text{max}_{g_i},\enspace K_q = k_R \big\} \Big) \longrightarrow \big<\omega_1\enspace\omega_2\enspace\dots\enspace\omega_R\big>^k \]

\[ \bar I_R^k:= \frac{\sum_{q = 1}^N{g_q}{I_R}}{\sum_{q = 1}^N g_q} \enspace\Big|\enspace K_q = k_R \longrightarrow \big<i_1\enspace i_2\enspace\dots\enspace i_R\big>^k \]

Note: the superscript \(k\) is an indicator that the metric is indexed by values in \(k\) — it is not an exponent.


Next, the slope and curvature of \(\Omega_R^k\) and \(\bar I_R^k\) with respect to \(k\) are derived, which maps the scores and mean information to values in \(k\):

\[ \frac{\partial}{\partial{k}} \big\{\Omega_R^k;\enspace \bar I_R^k\big\} \Rightarrow \big\{{\Omega_R^k}^{'};\enspace {{\bar I_R^k}}^{'}\big\} \enspace \\ \frac{\partial^2}{\partial{k^2}} \big\{\Omega_R^k;\enspace \bar I_R^k\big\} \Rightarrow \big\{{\Omega_R^k}^{''};\enspace {{\bar I_R^k}}^{''}\big\} \]

Step 5
Within-fold Scores, Optimal and Alternate Breaks

The final scoring vector is defined as follows:

\[ \Gamma_R^k:= -\lambda({\bar I_R^k}^{''}) \times \lambda({\Omega_R^k}^{''}) \enspace\Big|_{q = 1}^R \longrightarrow \big<\gamma^k_1\enspace\gamma^k_2\enspace\dots\enspace\gamma^k_q\big>\enspace\big|_{q=1}^R\\ \lambda:= f(x)\to \frac {x}{x_{max}} \]

The optimal break (\(k\)) is the one that maximizes \(\Gamma_R^k\) by maximizing \({\Omega_R^k}^{''}\) and minimizing \({\bar I_R^k}^{''}\), with alternatives as those values where the score is some distance \(\zeta\) from the maximized \(\Gamma_R^k\):

\[ \hat{k} :=k\enspace|\enspace\gamma_q^k = \Gamma_{max} \] \[ \ddot{k}:=k\enspace|\enspace\gamma_q^k \ge \big(\Gamma_{max}-\zeta\big) \]
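Step 5 can be sketched with hypothetical curvature vectors: normalize with \(\lambda(x) = x / x_{max}\), score, then take the maximizer and any alternates within \(\zeta\) of it.

```r
lambda   <- function(x) x / max(x)   # lambda as defined above
I_bar_d2 <- c(-0.5, -0.2, 0.1)       # hypothetical curvature of mean information
Omega_d2 <- c(0.3, 0.8, 0.4)         # hypothetical curvature of the score
k_vals   <- c(3, 7, 12)              # candidate breaks
Gamma    <- -lambda(I_bar_d2) * lambda(Omega_d2)
zeta     <- 0.5
k_hat <- k_vals[which.max(Gamma)]            # optimal break
k_alt <- k_vals[Gamma >= max(Gamma) - zeta]  # alternates within zeta
```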

Step 6
Final Score, Optimal and Alternate Breaks

Over \(\text{n\_folds}\) of generated and scored data, we have \(k_n:=\big<k_1\enspace k_2\enspace \dots\enspace k_n \big>\), from which the optimal \(k\) is “greedily” chosen as the most frequently occurring value: \(\underset{k}{\text{arg max}}\enspace{\#}k_n\big|_{q = 1}^n\).
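In R, this greedy selection is a one-liner over a frequency table; the per-fold vector k_n here is hypothetical.

```r
k_n    <- c(7, 7, 10, 7, 12)                        # per-fold optimal breaks
best_k <- as.integer(names(which.max(table(k_n))))  # modal k across folds
best_k
```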

Implementation

Computing Workspace

GitHub-hosted R libraries can be installed using remotes::install_github("delriaan/<library>", subdir = "pkg").

Data were synthesized to serve as observation data (a sample of which follows):

Prescriptive vs. Tuned

# PRESCRIBED TIMEOUT ====
prescribed_timeout_islands <- event.vectors::continuity(
        data = obs_data
        , map_fields = c(join_key, src)
        , time_fields = c(date.start, date.end)
        , boundary_name = episode
        , timeout = 10
        , show.all = TRUE
        );

# TUNED TIMEOUT ====
tuned_timeout <- event.vectors::break_signal(
    y = obs_data$date.start
    , grp = obs_data[, paste(join_key, src, sep = ":")]
    , obs_ctrl = list(min_size = 5L, max_k = 120)
    ) |>
    event.vectors::signal_processor(cl_size = 8, nfolds = 5, .debug = FALSE);
#> [INFO] Created cluster `cl` in the calling environment

tuned_timeout_islands <- event.vectors::continuity(
    data = obs_data
    , map_fields = c(join_key, src)
    , time_fields = c(date.start, date.end)
    , boundary_name = episode
    , timeout = !!tuned_timeout@best_k
    , show.all = TRUE
    );

Tuned Timeout Visualization


Islands Comparison: Prescribed vs. Tuned

In the contour plot below, if we plot the likelihood of mutual deviation between island-generation parameters by generated islands …

Using mutual information between island-generation parameters …


Final Considerations

Two very important pre-algorithm factors about the underlying data directly affect the derivation of the optimal break in the signal:

Grouping Feature Selection

The choice of features used to group the original response should reflect the natural grouping of features related to the generative process under study. This is especially the case when the features are ontologically hierarchical, in which case, the level of the hierarchy should be chosen based on domain/subject matter expertise.

For example, a health plan considering claims data may group a cohort by the features member, provider, and service: \(\big\{X_\alpha;\enspace X_\beta;\enspace X_\gamma\big\}\), yielding \(\alpha\times{\beta}\times{\gamma}= i\) possible groupings. The actual number of groups is smaller than this due to the actual experiences of the members ⇒ \(f\big(X_\beta,X_\gamma; X_\alpha\big):=\big\{X_\beta\mapsto X_\gamma \big\}\big|X_\alpha\).
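The gap between possible and realized groupings can be illustrated with a toy claims table (all names and values hypothetical):

```r
claims <- data.frame(
  member   = c("a", "a", "b"),
  provider = c("p1", "p1", "p2"),
  service  = c("s1", "s2", "s1")
)
# Cartesian upper bound vs. groupings actually present in the data:
possible <- prod(sapply(claims, function(x) length(unique(x))))  # 2 * 2 * 2
realized <- nrow(unique(claims))
c(possible = possible, realized = realized)
```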

“Spotty” Signal

When the input vector is already characterized by large values of \(\Phi_i\) (resulting in a left-skewed probability mass distribution), the optimal break will be small and likely of little practical value. The reason goes back to one of the earlier-stated assumptions: the cumulative summation of \(\Phi_i\) is geometrically distributed, which is clearly right-skewed. Therefore, in addition to grouping feature selection, consideration should be given to whether the data are appropriate for the use case. A future version of break_signal() will provide some form of heuristic to check for appropriateness, but this is a long-term goal.


Appendix

Building Blocks

Eq. 1: \(X|G\)

Indexed groups derived from Cartesian combinations of selected features:

\[ X|G_i\to\Omega_i \in \mathbb{R}\\ G_i:= \{g_1 \wedge g_2 \wedge \cdots \wedge g_\iota\} \]

Eq. 2: \(\Phi_i\)

Within-group sequential differences:

\[ \Phi_i:=\Delta{X}|G_i \longrightarrow \big<\phi_1 \enspace \phi_2 \enspace \cdots \enspace \phi_j\big>_i \\ \enspace \\ j:= ||X_i||-1\\ \enspace\\ \Phi_i \in \big\{0, 1, 2, \dots, \infty\big\} \]

Eq. 3a: \(B_i\); Eq. 3b: \(B^\dagger_i\)

Within-group break cycles and cycle duration:

\[ \forall \gamma \in \Bigg\{ 1,\enspace \begin{cases} \emptyset & \delta_{j + 1} \ge 0 \\ j & \delta_{j + 1} < 0 \end{cases} \Bigg\}_i \\ B_i:= \Big\{ \big\{1, \gamma_1\big\}, \enspace \big\{\gamma_1 + 1, \gamma_2 \big\},\enspace \cdots, \enspace \big\{\gamma_z, j\big\} \Big\}_i \longrightarrow \big<\beta_1 \enspace\beta_2 \enspace\cdots \enspace\beta_z\big>_i \]

\[ B^\dagger_i:= \sum\Phi_i|B_i \]

Note: \(B_i\) may be censored due to measure methodology or lack of available data.

Eq. 4: \(P(K_N)\)

Global difference occurrence probabilities. \(P(K_N)\) is derived as a proportional frequency table over the unique values in \(\Phi_i\) (Eq. 2):

\[ P(K_N):= N^{-1}\sum_{n = 1}^N \sum_{\alpha=1}^i\big|\big| K_n \cap \big<\phi_1 \enspace \phi_2 \enspace \cdots \enspace \phi_j\big>_\alpha \big|\big| \\ \longrightarrow \big<P_1 \enspace P_2 \enspace \dots \enspace P_N \big> \]

where \(K_N\) is the set of \(N\) unique values \(\forall \Phi\) (essentially, all possible values).

Eq. 5: \(I_N\)

Global information vector in “bits”:

\[ I_N:=-\log_2[P(K_N)] \longrightarrow \big<I_1 \enspace I_2 \enspace \dots \enspace I_N \big> \]
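Eqs. 4 and 5 can be sketched together: pool the per-group difference vectors, build the global proportional frequency table, and convert to bits (the per-group vectors below are hypothetical).

```r
phi_groups <- list(c(1, 2, 1), c(1, 5))  # hypothetical per-group Phi vectors
phi_all    <- unlist(phi_groups)
P_KN <- prop.table(table(phi_all))       # Eq. 4: global probabilities P(K_N)
I_N  <- -log2(P_KN)                      # Eq. 5: global information in bits
```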